1.  INTRODUCTION


In August 1981, the Revenue and Operations Research Branch asked
Central Statistical Services to evaluate the sampling methodology
used to gather price samples, and to analyse the sample data
already collected for the Treasury Budget Proposal regarding
ad-valorem taxation of Gasoline and Tobacco products. Bills 72,
73 and 76, the legislation implementing the Treasurer's budget
proposals regarding ad-valorem taxation of Motor Fuels and Tobacco
products, established that the taxable price to which the tax
rates are to be applied would be based on the median price
obtained by the Minister of Revenue from such periodic sampling
of retail prices as he considers appropriate.

Staff in the Ministry of Revenue, with the concurrence
of Treasury, had established that the sample, each
quarter,  would  be  drawn  from  the  area  of  Ontario  bounded  by 
Oshawa  on  the  east,  Barrie  on  the  north,  Kitchener  on  the 
west  and  Niagara  Falls  on  the  south.  The  collected  data  from 
the  sample  surveys  was  given  to  Central  Statistical  Services 
in  late  September  for  analysis. 


2.  PURPOSE 

There  are  two  basic  objectives  of  this  study: 
i)  To  evaluate  the  sampling  technique  used  by  the  Ministry 
of  Revenue  to  collect  sample  data; 
ii)  Based on the analysis of survey data collected by the
Ministry  of  Revenue,  recommend  improvements  in  sampling 
methodology  and  estimation  procedures  of  the  taxable  price 
of  Motor  Fuels  and  Tobacco  products. 

3.  SUMMARY  AND  CONCLUSIONS 

.  Presently, the Ministry of Revenue collects data by
judgement sample. Uniformed fuel tax inspectors collect
data along predetermined routes in the area specified
in this study. Judgement sampling permits the sampler to
select any sample, and the probability of selection is
unknown. There is no assurance that the data collected
represent the target population. In addition, errors such
as bias may be introduced which cannot easily be measured.
We do not advise continuation of judgement sampling; we
advise the selection of a probability sample determined by
a recognized statistical sampling method, lest the
methodology be indefensible.

.  Presently,  the  selected  sample  sizes  (500  to  600)  for  the 
survey  are  too  large;  we  advise  that  sample  size  be  selected 
according  to  accuracy  and  confidence  level  needed  in  the 
results.  This  will  result  in  cost  savings. 

.  Presently, only the median is calculated; we advise that
all basic statistics and the sampling errors be determined
for the sample data. Analysis of the data shows no
significant difference among the arithmetic mean, median
and mode. Since the mean and median are practically the
same, the arithmetic mean can be used to discuss errors and
confidence limits of the population average.

.  Presently,  no  error  analysis  is  performed  to  indicate 
confidence  in  the  results.  It  is  necessary  to  estimate 
sampling  errors  and  confidence  limits  in  the  survey  data. 
(This  allows  the  Ministry  of  Revenue  to  answer  the 
fundamental  question:  How  good  is  the  estimate?) 

.  Since  the  variability  in  the  sample  data  is  small,  the 
survey  data  indicates  the  following: 

-  the  data  observations  of  all  variables  (Motor  Fuels 
and  Tobacco  products)  are  quite  homogeneous; 

-  the  geographic  variation  is  very  small  between  Metro 
and  outside  Metro; 

-  there is no significant difference between estimated
prices in Metro and outside Metro;

-  the  arithmetic  mean  of  various  routes  shows  no 
significant  difference  between  them. 


4.  RECOMMENDATIONS 

The following recommendations are made to improve
the survey methodology and estimation procedures:

i)  It is highly recommended that recognized sampling
procedures be used for the sample survey of Motor Fuels
and Tobacco products, i.e., systematic random sampling,
or stratified or cluster sampling with random selection.

ii)  Sample  size  for  the  sample  survey  should  be  selected 
to  reflect  accuracy  and  confidence  level  required  in 
the  final  results. 

iii)  It is highly recommended that error analysis be
performed and confidence limits be determined for the
estimate of the population average.
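
The selection procedures named in recommendation i) can be sketched as follows. This is a minimal illustration in Python, not the Ministry's procedure: the list of station identifiers, the sample size of 50, the 5% sampling fraction and the split into two strata are all hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Select n units by systematic random sampling: random start, fixed interval."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start in [0, k)
    # Note: when len(frame) is not a multiple of n, the last few units of the
    # frame can never be selected; a circular variant would avoid this.
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        size = max(1, round(len(units) * fraction))
        sample[name] = rng.sample(units, size)
    return sample

rng = random.Random(1981)  # fixed seed so the draw can be repeated and audited

# Hypothetical sampling frame: station identifiers, not taken from the report.
stations = [f"station-{i:04d}" for i in range(1, 1201)]
picked = systematic_sample(stations, 50, rng)

# Hypothetical strata, for illustration only.
strata = {"Metro": stations[:700], "Outside Metro": stations[700:]}
by_stratum = stratified_sample(strata, 0.05, rng)
```

In both procedures every selected unit has a known chance of selection, which is what allows sampling errors to be estimated afterwards.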


5.  DATA ANALYSIS OF SAMPLE DATA

The  sample  data  (first  quarter)  collected  by  the 
Ministry  of  Revenue  was  analyzed  in  detail: 

A.  Estimation  of  Average  Prices 

In Tables 1 and 2, the median, mode and arithmetic mean
are estimated and compared. The sample data indicate
that there is no significant difference between the
arithmetic mean, median and mode. This is consistent with
the prices of the various grades being approximately
normally distributed.

In general, the prices of Motor Fuel products can be expected
to be approximately normally distributed, hence the arithmetic
mean can be used to calculate errors and confidence limits. The
median is a poor estimator of the average for this purpose, since
the basic statistical measurements cannot be calculated with the
median.
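
The comparison of the three averages can be reproduced from the published figures. The sketch below, in Python, takes the rounded values of Table 1 and computes the differences shown in Table 2; the function name is illustrative only. Because the inputs are rounded to a tenth of a cent, a result may differ slightly from a published difference that was computed from unrounded figures (for example, Premium Leaded mean minus median).

```python
# Averages in cents per litre, transcribed from Table 1: (mean, median, mode).
averages = {
    "Regular Leaded":   (27.7, 27.2, 27.2),
    "Regular Unleaded": (29.9, 29.3, 29.2),
    "Premium Leaded":   (31.2, 31.5, 29.4),
    "Premium Unleaded": (31.1, 30.4, 30.2),
    "Diesel":           (26.3, 26.0, 25.9),
}

def average_gaps(mean, median, mode):
    """Return (mean - median, mean - mode, median - mode), rounded to 0.1 cent."""
    return (
        round(mean - median, 1),
        round(mean - mode, 1),
        round(median - mode, 1),
    )

for grade, (mean, median, mode) in averages.items():
    print(grade, average_gaps(mean, median, mode))
```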

B.  Statistical  Parameters  of  Sample  Data 

Table  3  provides  estimates  of  statistical  parameters 
which  are  needed  to  estimate  the  accuracy  and  precision 
of  estimates  of  the  average  price  of  various  grades  of 
gasoline.
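
As an illustration of how the parameters in Table 3 relate to one another, the following Python sketch derives the standard deviation, coefficient of variation and standard error from the sample size, mean and variance. It uses the Regular Unleaded figures; the function and key names are illustrative. The coefficient of variation comes out slightly different from the published value because the mean used here is rounded to one decimal.

```python
import math

def sample_parameters(n, mean, variance):
    """Derive the standard deviation, coefficient of variation and standard error."""
    sd = math.sqrt(variance)
    return {
        "sd": sd,                       # standard deviation, cents
        "cv_percent": 100 * sd / mean,  # the Table 3 values correspond to a percentage
        "se": sd / math.sqrt(n),        # standard error of the mean, cents
    }

# Regular Unleaded, from Table 3: n = 452, mean = 29.9, variance = 2.36978.
regular_unleaded = sample_parameters(n=452, mean=29.9, variance=2.36978)
print(regular_unleaded)
```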

C.  Confidence  Limits 

Table 4 estimates the population mean of each grade of
Motor Fuel and provides confidence limits. It should be
noted that the estimates at the 95% confidence level are very
precise. This is due partly to the large sample size and
partly to a good judgement sample.
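
The confidence limits are the sample mean plus or minus t times the standard error. A minimal Python sketch, using the Premium Leaded figures from Tables 3 and 4 and the report's value t = 1.96 (the function name is illustrative):

```python
def confidence_limits(mean, standard_error, t=1.96):
    """Return the lower and upper confidence limits: mean -/+ t * standard error."""
    half_width = t * standard_error
    return mean - half_width, mean + half_width

# Premium Leaded: sample mean 31.2 cents, standard error .33450 cents.
low, high = confidence_limits(31.2, 0.33450)
print(round(low, 1), round(high, 1))
```

Rounded to a tenth of a cent, the limits agree with the Premium Leaded row of Table 4.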

D.  Optimum  Sample  Size 

Table 5 provides estimates of the optimum sample size at
various levels of tolerated error. For example, if the average
price must be estimated to within half a cent at the 95%
confidence level, the sample sizes in Column 6 will provide
the required results. Similarly, Column 9 provides the sample
sizes needed to achieve accuracy to within 1/5 of a cent.
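
The sample sizes can be computed as n = (t s / e)^2, where s is the sample standard deviation and e the tolerated error. The Python sketch below applies this to Regular Unleaded; rounding to the nearest whole observation is an assumption, chosen because it reproduces the Regular Unleaded row of Table 5, and the function name is illustrative.

```python
def optimum_sample_size(std_dev, tolerated_error, t=1.96):
    """Return n = (t * s / e)^2, rounded to the nearest whole observation."""
    return round((t * std_dev / tolerated_error) ** 2)

# Tolerated errors in cents per litre, as in the column headings of Table 5.
errors = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Regular Unleaded: standard deviation 1.53941 cents (Table 3).
regular_unleaded_sizes = [optimum_sample_size(1.53941, e) for e in errors]
print(regular_unleaded_sizes)
```

Halving the tolerated error roughly quadruples the required sample, which is why the sizes rise so steeply towards the right-hand columns.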

E.  Analysis  of  Tobacco  and  Cigarette  Data 

Tables 6, 7, 8, 9 and 10 provide information on estimation
of average price, statistical parameters of sample, confidence
limits and optimum sample size as described above in A, B,
C and D.

F.  Metro  Sample  vs  Outside  Metro  Sample 

Data collected in Metro and outside Metro were analyzed for
comparison purposes. Table 11 includes statistical parameters
for all fuel variables, tobacco and cigarettes.

As  can  be  seen,  there  is  no  significant  difference  in 
average  prices  between  Metro  and  outside  Metro. 

G.  Comparison of Average Prices of Various Routes

Table 12 provides the arithmetic mean by route for the various
grades of fuel. There is no significant difference in the
average prices of the various grades of fuel between routes.

The average price of fuel on the various routes is almost the
same as the mean price for the whole sample. Statistically,
there is no difference in these averages.
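
The comparison between route averages and the overall sample mean can be checked from the figures in Table 12. The Python sketch below takes the unweighted mean of the route means for each grade and subtracts the overall sample mean; variable and function names are illustrative. Table 12 appears to truncate the route mean to two decimals before subtracting, so the last digit may differ.

```python
from statistics import mean

# Route means in cents per litre, transcribed from Table 12 (Routes 1 to 7).
# None marks a route with no observation for that grade ("N/A").
route_means = {
    "Diesel":           [26.4, 25.9, 25.7, 25.8, 26.4, 26.8, 27.5],
    "Regular Leaded":   [28.0, 28.0, 27.7, 27.0, None, 28.3, 27.7],
    "Regular Unleaded": [30.1, 30.1, 29.9, 29.1, None, 30.0, 30.0],
    "Premium Leaded":   [32.5, 31.3, 30.8, 29.0, None, None, 29.9],
    "Premium Unleaded": [31.4, 31.3, 31.1, 30.6, None, 31.7, 31.0],
}

# Overall sample means ("Sample Mean by Total" in Table 12).
overall = {
    "Diesel": 26.3,
    "Regular Leaded": 27.7,
    "Regular Unleaded": 29.9,
    "Premium Leaded": 31.2,
    "Premium Unleaded": 31.1,
}

def route_gap(grade):
    """Mean of the available route means minus the overall sample mean."""
    values = [v for v in route_means[grade] if v is not None]
    return mean(values) - overall[grade]

for grade in route_means:
    print(grade, round(route_gap(grade), 2))
```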




Table 1

Three Different Average Prices from Sample

Grade of Motive Fuel    Arithmetic Mean    The Median    The Mode

Regular Leaded               27.7¢            27.2¢        27.2¢
Regular Unleaded             29.9             29.3         29.2
Premium Leaded               31.2             31.5         29.4
Premium Unleaded             31.1             30.4         30.2
Diesel                       26.3             26.0         25.9

Table 2

Comparison Among Three Different Average Prices

Grade                   Mean-Median    Mean-Mode    Median-Mode

Regular Leaded              0.5¢          0.5¢          0
Regular Unleaded            0.6           0.7           0.1
Premium Leaded             -0.25          1.8           2.1
Premium Unleaded            0.7           0.9           0.2
Diesel                      0.3           0.4           0.1

Table 3

Estimation of Statistical Parameters from Sample Data

Grade of            Sample    Mean     Variance    Standard     Coefficient     Standard
Motive Fuel           n        x̄         s²        Deviation    of Variation    Error
                                                       s            s/x̄          s/√n
                    (Unit)   (Cent)    (Cent)      (Cent)       (Ratio, %)     (Cent)

Regular Leaded        457     27.7     2.07848     1.44187       5.20418       .06744
Regular Unleaded      452     29.9     2.36978     1.53941       5.15272       .07240
Premium Leaded         32     31.2     3.58065     1.89226       6.06008       .33450
Premium Unleaded      379     31.1     2.66794     1.63338       5.25119       .09918
Diesel                 98     26.3     2.45773     1.56772       5.95443       .15836

Table 4

Estimation of Population Mean at 95% Confidence Limits

Grade of            Sample Mean    Population Mean           Estimated
Motive Fuel             x̄          X = x̄ ± t(s/√n)           Population Mean X
                      (Cent)            (Cent)                   (Cent)

Regular Leaded         27.7        27.7 ± 1.96(.06744)       27.6 < X < 27.8
Regular Unleaded       29.9        29.9 ± 1.96(.07240)       29.8 < X < 30.0
Premium Leaded         31.2        31.2 ± 1.96(.33450)       30.5 < X < 31.9
Premium Unleaded       31.1        31.1 ± 1.96(.09918)       30.9 < X < 31.3
Diesel                 26.3        26.3 ± 1.96(.15836)       26.0 < X < 26.6

$$X = \bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

Where t = 1.96 at 95% confidence probability

Table 5

Estimation of Optimum Sample Sizes at Various Levels
of Tolerated Errors

Grade of                       Optimum Sample Size n_i Required
Motive Fuel          n1    n2    n3    n4    n5    n6    n7    n8    n9    n10
                     1¢    .9    .8    .7    .6    .5    .4    .3    .2    .1

Regular Leaded        8    10    12    16    22    32    50    89   200    799
Regular Unleaded      9    11    14    19    25    36    57   101   228    910
Premium Leaded       14    17    21    28    38    55    86   153   344  1,379
Premium Unleaded     10    13    16    21    28    41    64   114   256  1,025
Diesel                9    12    15    19    26    38    59   105   236    944

$$n_i = \left(\frac{t\,s}{e_i}\right)^2$$

Where n_i = Optimum Sample Size required
t = 1.96 at 95% Confidence Level
s² = Variance and s = Standard Deviation,
and
e_i = tolerated error in terms of cent per litre:
1, .9, .8, .7, .6, .5, .4, .3, .2, .1.

Table 6

Three Different Average Prices from Sample

Tobacco and          Arithmetic Mean    The Median    The Mode
Cigarettes

Tobacco                  $1.41            $1.43         $1.45
Cigarettes -
  All sizes               1.07             1.05          1.05
  Regular size            1.06             1.05          1.05
  King size               1.07             1.05          1.05
  100's                   1.08             1.05          1.05

Table 7

Comparison Among Three Different Average Prices

Tobacco and          Mean-Median    Mean-Mode    Median-Mode
Cigarettes

Tobacco                -$0.02         -$0.04        -$0.02
Cigarettes -
  All sizes              .02            .02           0
  Regular size           .01            .01           0
  King size              .02            .02           0
  100's                  .03            .03           0

Table 8

Estimation of Statistical Parameters from Sample Data

Tobacco and         Sample    Mean    Variance      Standard     Coefficient     Standard
Cigarettes            n        x̄         s²         Deviation    of Variation    Error
                                                        s            s/x̄          s/√n
                    (Unit)    ($)       ($)           ($)        (Ratio, %)       ($)

Tobacco               556     1.41    .0208921      .144541       10.2338       .0061299
Cigarettes -
  All sizes         1,684     1.07    .00464539     .0681571       6.37134      .00166089
  Regular size        592     1.06    .0031255      .0559062       5.26544      .00229773
  King size           589     1.07    .00503134     .070932        6.61331      .0029227
  100's               503     1.08    .00587931     .0766767       7.12711      .00341884

Table 9

Estimation of Population Mean at 95% Confidence Limits

Tobacco and         Sample Mean    Population Mean             Estimated
Cigarettes              x̄          X = x̄ ± t(s/√n)             Population Mean X
                       ($)              ($)                        ($)

Tobacco                1.41        1.41 ± 1.96(.0061299)       1.40 < X < 1.42
Cigarettes -
  All sizes            1.07        1.07 ± 1.96(.00166089)      1.06 < X < 1.07
  Regular size         1.06        1.06 ± 1.96(.00229773)      1.05 < X < 1.07
  King size            1.07        1.07 ± 1.96(.0029227)       1.06 < X < 1.08
  100's                1.08        1.08 ± 1.96(.00341884)      1.07 < X < 1.09

$$X = \bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

Where t = 1.96 at 95% confidence probability.

Table 10

Estimation of Optimum Sample Sizes at Various Levels
of Tolerated Errors

Tobacco and                    Optimum Sample Size n_i Required
Cigarettes           n1    n2    n3    n4    n5    n6    n7    n8    n9    n10
                    $.1   .09   .08   .07   .06   .05   .04   .03   .02   .01

Tobacco               8    10    13    16    22    32    50    89   201    803
Cigarettes -
  All sizes           2     2     3     4     5     7    11    20    45    179
  Regular size        1     1     2     2     3     5     8    13    30    120
  King size           2     2     3     4     5     8    12    21    48    193
  100's               2     3     4     5     6     9    14    25    56    225

$$n_i = \left(\frac{t\,s}{e_i}\right)^2$$

Where n_i = Optimum Sample Size required
t = 1.96 at 95% Confidence Level
s² = Variance and s = Standard Deviation
e_i = tolerated error in terms of dollar per unit:
$.1, .09, .08, .07, .06, .05, .04, .03, .02, .01.


Table 11

Comparison of Statistical Variables and Parameters Between Metro Toronto and Outside Metro

Table 12

The Arithmetic Mean by Route Number and Grade of Fuel

Route Number            Diesel    Regular    Regular     Premium    Premium
                                  Leaded     Unleaded    Leaded     Unleaded

Route 1                 26.4¢     28.0¢      30.1¢       32.5¢      31.4¢
Route 2                 25.9      28.0       30.1        31.3       31.3
Route 3                 25.7      27.7       29.9        30.8       31.1
Route 4                 25.8      27.0       29.1        29.0       30.6
Route 5                 26.4      N/A        N/A         N/A        N/A
Route 6                 26.8      28.3       30.0        N/A        31.7
Route 7                 27.5      27.7       30.0        29.9       31.0

Sample Mean by Route    26.35     27.78      29.86       30.70      31.18
Sample Mean by Total    26.30     27.70      29.90       31.20      31.10
Route - Total           0.05¢     0.08¢     -0.04¢      -0.50¢      0.08¢